Prevent Unlimited Agg Recursion with Duplicate Col Names #21066
I have #21001 for the whatsnew. Will merge after tagging 0.23.0.
Codecov Report
@@ Coverage Diff @@
## master #21066 +/- ##
==========================================
- Coverage 91.83% 91.83% -0.01%
==========================================
Files 153 153
Lines 49495 49497 +2
==========================================
Hits 45454 45454
- Misses 4041 4043 +2
@@ -5731,7 +5731,12 @@ def diff(self, periods=1, axis=0):
    # ----------------------------------------------------------------------
    # Function application

    def _gotitem(self, key, ndim, subset=None):
    def _gotitem(self,
I realize we don't have an overall strategy for annotations just yet, but I had to think through this as I was debugging anyway, so I figured I'd put it here explicitly for when we turn this on.
ok!
pandas/core/frame.py
Outdated
@@ -5746,9 +5751,11 @@ def _gotitem(self, key, ndim, subset=None):
        """
        if subset is None:
            subset = self
        elif subset.ndim == 1:
is this line actually hit in tests? return self._constructor here doesn't make sense
Yes this is actually hit by the test case and code that was in place. A Series is a valid value for the subset parameter so I’m forcing it to a DataFrame or else the subsequent slice would fail. All for a better way if you think there’s one
what operation actually hits this code?
This line (which was changed to prevent the unlimited calls):
This would pass a Series before (unless column names were duplicated), though it never raised an error because of the code in _gotitem. It would accept the Series and even use it in a condition, but would always just return a subset of itself...
Definitely convoluted - I think the way _gotitem was previously implemented by DataFrame was inadvertent.
no, the issue is why you are wrapping it with self._constructor (which is a DataFrame here) and then selecting it out again. just doing return subset is enough, i think?
Hmm, you're probably right about that - I suppose it could just return immediately if ndim == 1. Will try locally and push if it works.
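A minimal sketch of the early return discussed in this thread, using hypothetical stand-in classes (ToySeries, ToyFrame) rather than pandas itself. It assumes, as in pandas, that selecting a duplicated label from a frame yields a 2-D object instead of a single column. The names and structure are illustrative only and are not the code that was merged.

```python
class ToySeries:
    """Hypothetical 1-D stand-in for a Series (not the pandas class)."""
    ndim = 1

    def __init__(self, values, name=None):
        self.values = list(values)
        self.name = name


class ToyFrame:
    """Hypothetical 2-D stand-in for a DataFrame; column labels may repeat."""
    ndim = 2

    def __init__(self, columns):
        # columns: list of (label, values) pairs, duplicates allowed
        self.columns = [(label, list(values)) for label, values in columns]

    def __getitem__(self, key):
        matches = [(label, vals) for label, vals in self.columns if label == key]
        if not matches:
            raise KeyError(key)
        if len(matches) == 1:
            label, vals = matches[0]
            return ToySeries(vals, name=label)
        # A duplicated label selects several columns, so the result stays 2-D.
        return ToyFrame(matches)

    def _gotitem(self, key, ndim, subset=None):
        if subset is None:
            subset = self
        elif subset.ndim == 1:
            # The caller already holds a single column: return it unchanged
            # instead of re-wrapping it in a frame and selecting by label again.
            return subset
        return subset[key]
```

With duplicated labels, `frame["a"]` comes back 2-D, so re-selecting by label can never narrow it down to one column. Returning a 1-D `subset` as-is avoids that round trip entirely.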
FYI, the whatsnew is available now.
lgtm. ping on green. note: am happy to see simplification / refactoring of the code here, though there are a lot of use cases so this is tricky.
Failed on AppVeyor but does not appear to be related (looks like a UserWarning wasn't being thrown for an io test with the PythonParser?)
Looks good. I'll keep an eye on the appveyor test in case it's relevant.
…-dev#21066) (cherry picked from commit d623ffd)
(cherry picked from commit d623ffd)
git diff upstream/master -u -- "*.py" | flake8 --diff
The _gotitem implementation for DataFrame seems a little strange so there may be a more comprehensive approach, but this should prevent the issue for the time being.
Didn't add whatsnew yet since none existed for 0.23.1. Happy to add if we are OK with this fix.